Information on data:

The following data describes building damage from the New Orleans tornado of December 2022. The data was obtained from Verisk Analytics, which derived it from post-catastrophe aerial imagery using computer vision and machine learning. There are approximately 42,000 buildings in this dataset.


Here are some before and after photos of three buildings:

Building 1: catastrophe score of 100


Building 2: catastrophe score of 100


Building 3: catastrophe score of 60

Clean data:

I converted roofsolar into a logical (TRUE/FALSE) variable by converting “SOLAR PANEL” to TRUE and “NO SOLAR PANEL” to FALSE. I also set to NA any roof shape that the computer vision model was less than 80% confident about (that is, more than a 20% chance of being incorrect). Some cells in damage_level contained an empty string, so I converted those to NA as well. Finally, I split rooftopgeo into separate longitude and latitude columns so that the coordinates could be read easily by leaflet.

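library(dplyr)    # %>%, mutate(), select(), filter(), case_when()
library(tidyr)    # separate()
library(stringr)  # str_to_title(), used later in create_popup()

df <- read.csv("clean_data.csv") %>% 
  janitor::clean_names() %>% 
  # Recode the solar-panel labels to logical; any other value becomes NA
  mutate(roofsolar = case_when(roofsolar == "SOLAR PANEL" ~ TRUE,
                               roofsolar == "NO SOLAR PANEL" ~ FALSE)) %>%
  # Drop roof-shape labels whose confidence score is below 0.80
  mutate(roofshape = ifelse(roofshascr < 0.80, NA, roofshape)) %>%
  select(-c(roofshascr, roofcondit_discolordetect, roofcondit_discolorscore, roofcondit_discolorpercen, trampscr, roofcondit_tarppercen))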

df$rooftopgeo <- gsub("POINT \\(|\\)", "", df$rooftopgeo)

df <- df %>%
  separate(rooftopgeo, into = c("long", "lat"), sep = " ", convert = TRUE)

df$damage_level <- ifelse(df$damage_level == "", NA, df$damage_level)
df$roofshape <- factor(df$roofshape, levels = c("gable", "hip", "flat"))
# Any material not listed in these levels (for example "gravel") becomes NA
levels_roofmateri <- c("metal", "shingle", "membrane", "shake", "tile")
df$roofmateri <- factor(df$roofmateri, levels = levels_roofmateri)
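
To illustrate the coordinate step, here is a minimal sketch on a single made-up value. It assumes rooftopgeo holds well-known-text points of the form POINT (long lat), which is what the gsub() pattern implies; the coordinates themselves are hypothetical, and the sketch uses base R strsplit() in place of separate().

```r
# Hypothetical WKT-style value; real values come from df$rooftopgeo
wkt <- "POINT (-90.0715 29.9511)"

# Strip the "POINT (" prefix and the closing ")"
stripped <- gsub("POINT \\(|\\)", "", wkt)

# Split on the space: longitude comes first, then latitude
parts <- as.numeric(strsplit(stripped, " ")[[1]])
long <- parts[1]
lat  <- parts[2]
```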

Define damage categories:

The damage categories use cut points drawn from the summary statistics of the non-zero catastrophe scores. For example, the cut point of 15 is the median of the non-zero scores.

mostdamage <- df %>% filter(catastrophescore >= 50)
nodamage <- df %>% filter(catastrophescore == 0)
decimated <- df %>% filter(catastrophescore == 100)
middamage <- df %>% filter(catastrophescore < 50 & catastrophescore >= 15)
leastdamage <- df %>% filter(catastrophescore < 15 & catastrophescore >= 2)
minimaldamage <- df %>% filter(catastrophescore == 1)
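
As an alternative sketch, the same cut points can label every building in a single column instead of producing separate data frames. This is not what the code above does; damage_category and the toy scores are hypothetical, and the sketch assumes whole-number scores. Because decimated (score of 100) is a subset of mostdamage (score >= 50) in the code above, the sketch keeps 100 inside the top band.

```r
# Toy scores standing in for df$catastrophescore
scores <- c(0, 1, 2, 14, 15, 49, 50, 100)

# Left-closed intervals: [0,1), [1,2), [2,15), [15,50), [50,100]
damage_category <- cut(
  scores,
  breaks = c(0, 1, 2, 15, 50, 100),
  labels = c("none", "minimal", "least", "mid", "most"),
  right = FALSE,
  include.lowest = TRUE  # with right = FALSE, this closes the last interval at 100
)

table(damage_category)
```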

Function for adding labels to the maps:

create_popup <- function(data) {
  paste("<b>Location</b><br>",
        "&nbsp;&nbsp;&nbsp;Longitude: ", data$long, "<br>",
        "&nbsp;&nbsp;&nbsp;Latitude: ", data$lat, "<br>",
        "<b>Catastrophe Score</b><br>",
        "&nbsp;&nbsp;&nbsp;Score: ", data$catastrophescore, "<br>",
        "<b>Roof Shape</b><br>",
        "&nbsp;&nbsp;&nbsp;Shape: ", str_to_title(data$roofshape), "<br>",
        "<b>Roof Material</b><br>",
        "&nbsp;&nbsp;&nbsp;Material: ", str_to_title(data$roofmateri), "<br>")
}

# Root median squared error of a fitted lm or glm.
# With absolute = TRUE, return the median absolute error instead.
r_med_sq_err <- function(model, absolute = FALSE){
  if(inherits(model, c("glm", "lm"))){
    if(absolute){
      # Return early so the squared version below is not used
      return(median(abs(residuals(model))))
    }
    sqrt(median(residuals(model)^2))
  } else {
    if(identical(class(model), "list")){
      stop("Did you provide a list of models? Use map() instead.")
    }
    stop("'model' must be either a glm or lm object.")
  }
}
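
A small sketch of how a median-based error differs from the usual RMSE: a few large residuals pull the RMSE up, while the median-based values describe the typical miss. The example uses R's built-in mtcars data rather than the tornado data, and computes the quantities directly instead of calling r_med_sq_err().

```r
# Toy model on a built-in dataset (not the tornado data)
fit <- lm(mpg ~ wt, data = mtcars)
res <- residuals(fit)

rmse        <- sqrt(mean(res^2))    # root mean squared error
r_med_se    <- sqrt(median(res^2))  # root median squared error
med_abs_err <- median(abs(res))     # median absolute error

c(rmse = rmse, r_med_se = r_med_se, med_abs_err = med_abs_err)
```

The two median-based values generally land close together, because squaring preserves the ordering of the absolute residuals.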

Damage maps:

This map shows all of the buildings in the dataset. The vast majority of roofs have no damage.


These are the buildings that sustained damage:

Red marks the most damaged buildings (catastrophe score >= 50), orange marks scores above 25 and below 50, and blue marks scores of 25 or lower, excluding scores of 0. The majority of the buildings (3852) had a catastrophe score of 0.
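
The color scheme described here could be written as a small helper like the one below. score_color is a hypothetical name, and the cut points follow the prose (25 and 50) rather than the data frames defined earlier, which split at 15 and 50. A score of exactly 25 falls in the blue band, and a score of 0 gets no color.

```r
# Hypothetical helper: map catastrophe scores to marker colors
score_color <- function(score) {
  ifelse(score >= 50, "red",
  ifelse(score > 25,  "orange",
  ifelse(score > 0,   "blue", NA_character_)))
}

score_color(c(0, 10, 25, 26, 50, 100))
```

With leaflet, a helper like this could be passed as color = ~score_color(catastrophescore) to addCircleMarkers() after filtering out the zero scores, with popup = create_popup(data) supplying the labels.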

Map of the buildings that experienced damage:

Map of the buildings that experienced the most damage:

Map of the buildings that experienced mid damage:

Map of the buildings that experienced the least damage:

Map of the buildings that experienced no damage:

Map of the buildings that experienced no damage and the most damage:


Graphs:


Models:

Since most of the buildings in this dataset were not damaged by the tornado, the distribution of catastrophe scores is heavily right-skewed: the median and third quartile are both 0. This can be seen below:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.000   2.217   0.000 100.000

## # Comparison of Model Performance Indices
## 
## Name | Model |   AIC (weights) |  AICc (weights) |   BIC (weights) |    R2 |   RMSE |  Sigma
## --------------------------------------------------------------------------------------------
## mod1 |   glm | 2.8e+05 (<.001) | 2.8e+05 (<.001) | 2.8e+05 (<.001) | 0.007 | 10.780 | 10.781
## mod2 |   glm | 3.2e+05 (<.001) | 3.2e+05 (<.001) | 3.2e+05 (<.001) | 0.019 | 11.454 | 11.455
## mod3 |   glm | 2.8e+05 (<.001) | 2.8e+05 (<.001) | 2.8e+05 (<.001) | 0.007 | 10.780 | 10.781
## mod4 |   glm | 10240.8 (>.999) | 10240.9 (>.999) | 10277.4 (>.999) | 0.020 | 10.110 | 10.132
## mod5 |   glm | 2.8e+05 (<.001) | 2.8e+05 (<.001) | 2.8e+05 (<.001) | 0.006 | 10.764 | 10.765
## mod6 |   glm | 2.8e+05 (<.001) | 2.8e+05 (<.001) | 2.8e+05 (<.001) | 0.007 | 10.780 | 10.781
## [1] 11.55021
## [1] 10.76414

Check models: Extra

Because scores of 0 dominate the full dataset, I also made models that excluded the catastrophe scores of 0, to look only at the structures that experienced damage. Below is the summary for the structures that exhibited damage:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.00    4.00   15.00   28.64   46.00  100.00
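
A minimal sketch of this filtering step, using a toy data frame rather than the real data. The name extra matches the data argument in the Model 5 call further down, but how extra was actually built is not shown in this report, so the filter here is an assumption.

```r
# Toy data frame standing in for df; the scores are made up
toy <- data.frame(catastrophescore = c(0, 0, 0, 0, 0, 0, 2, 15, 46, 100))

summary(toy$catastrophescore)     # dominated by the zeros

# Keep only buildings with a non-zero score
extra <- subset(toy, catastrophescore > 0)
summary(extra$catastrophescore)   # describes damaged buildings only
```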

Models

## # Comparison of Model Performance Indices
## 
## Name  | Model |   AIC (weights) |  AICc (weights) |   BIC (weights) |    R2 |   RMSE |  Sigma
## ---------------------------------------------------------------------------------------------
## mods1 |   glm | 26272.9 (<.001) | 26272.9 (<.001) | 26314.3 (<.001) | 0.095 | 28.657 | 28.689
## mods2 |   glm | 26272.9 (<.001) | 26272.9 (<.001) | 26314.3 (<.001) | 0.095 | 28.657 | 28.689
## mods3 |   glm | 26094.1 (>.999) | 26094.1 (>.999) | 26147.3 (>.999) | 0.147 | 27.816 | 27.857
## mods4 |   glm | 30925.4 (<.001) | 30925.5 (<.001) | 30968.0 (<.001) | 0.156 | 28.795 | 28.821
## mods5 |   glm | 30773.9 (<.001) | 30773.9 (<.001) | 30828.6 (<.001) | 0.176 | 28.402 | 28.437

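Out of the models I made, Model 5 (mods5) had the highest R2 (0.176), so I used it going forward, although mods3 had the lowest AIC and RMSE. It should be noted that none of these models fits particularly well with the variables used: even the best explains less than 18% of the variance in catastrophe score.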
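
The comparison tables above have the layout printed by compare_performance() from the performance package, though the report does not show the call. As a rough sketch of the same kind of comparison using only base R, on the built-in mtcars data with hypothetical models m1 and m2:

```r
# Two toy Gaussian GLMs fitted to the same rows
m1 <- glm(mpg ~ wt, family = gaussian(link = "identity"), data = mtcars)
m2 <- glm(mpg ~ wt + hp, family = gaussian(link = "identity"), data = mtcars)

# RMSE from the response-scale residuals of a fitted model
rmse <- function(model) sqrt(mean(residuals(model, type = "response")^2))

comparison <- data.frame(
  model = c("m1", "m2"),
  AIC   = c(AIC(m1), AIC(m2)),
  BIC   = c(BIC(m1), BIC(m2)),
  RMSE  = c(rmse(m1), rmse(m2))
)
comparison
```

AIC and BIC are only comparable between models fitted to the same observations, which is worth checking when rows with missing values are dropped differently for each model.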

Model 5

## 
## Call:
## glm(formula = catastrophescore ~ long + roofmateri + rooftree + 
##     enclosure, family = gaussian(link = "identity"), data = extra)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -75.62  -18.17   -8.69   13.16   82.23  
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        4460.13497 1149.62714   3.880 0.000107 ***
## long                 49.03688   12.76374   3.842 0.000124 ***
## roofmaterishingle   -23.29005    1.43088 -16.277  < 2e-16 ***
## roofmaterimembrane   12.42319    2.18262   5.692 1.37e-08 ***
## roofmaterishake     -21.90493    4.73851  -4.623 3.94e-06 ***
## roofmateritile      -22.84290    7.71403  -2.961 0.003087 ** 
## rooftree              0.56774    0.06876   8.257  < 2e-16 ***
## enclosureTRUE        44.80193   10.78228   4.155 3.34e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 808.6766)
## 
##     Null deviance: 3157376  on 3226  degrees of freedom
## Residual deviance: 2603130  on 3219  degrees of freedom
##   (10 observations deleted due to missingness)
## AIC: 30774
## 
## Number of Fisher Scoring iterations: 2
##                GVIF Df GVIF^(1/(2*Df))
## long       1.010340  1        1.005157
## roofmateri 1.020779  4        1.002574
## rooftree   1.012753  1        1.006356
## enclosure  1.004155  1        1.002076
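
To show how the coefficients combine, here is a worked prediction using the estimates printed above. The building is hypothetical: longitude of -90, rooftree = 0, and enclosure = FALSE. Metal is the reference roof material because it is the first factor level, so it has no coefficient of its own.

```r
# Estimates copied from the Model 5 summary
intercept  <- 4460.13497
b_long     <- 49.03688
b_shingle  <- -23.29005
b_rooftree <- 0.56774

# Hypothetical building
long     <- -90
rooftree <- 0

# Metal roof (reference level), enclosure = FALSE
pred_metal <- intercept + b_long * long + b_rooftree * rooftree
# Same building with a shingle roof
pred_shingle <- pred_metal + b_shingle

round(c(metal = pred_metal, shingle = pred_shingle), 2)
```

This gives a predicted score of about 46.8 for the metal roof and about 23.5 for the shingle roof. The very large intercept is almost entirely offset by the longitude term, since longitudes in this area are close to -90.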

Root mean squared error for Model 4

## [1] 28.59962

Predictions:

I generated predicted catastrophe scores from Model 5:

I then plotted the predicted catastrophe scores alongside the actual catastrophe scores for reference.
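
The prediction and plotting code is not shown in this report. A minimal sketch of the general approach, using predict() on a toy Gaussian GLM fitted to the built-in mtcars data (the model and data are stand-ins, not Model 5):

```r
# Toy model; in the report this would be Model 5 fitted to `extra`
fit <- glm(mpg ~ wt + hp, family = gaussian(link = "identity"), data = mtcars)

# Predicted values on the response scale for the rows used in fitting
predicted <- predict(fit, type = "response")
actual    <- mtcars$mpg

# Predicted vs. actual; points near the dashed line are well predicted
plot(actual, predicted, xlab = "Actual", ylab = "Predicted")
abline(a = 0, b = 1, lty = 2)
```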


Interpretations of predictions and models used:

The variables included in this dataset were not very helpful in predicting catastrophe scores accurately, as the graph above shows.